Crate enum_dispatch

enum_dispatch provides a set of macros that can be used to easily refactor dynamically dispatched trait accesses to improve their performance by up to 10x.

Accessing structures through dynamic dispatch is known to have a high runtime cost. Dynamic dispatch is traditionally used to hide unnecessary type information, improving encapsulation and making it trivial to add new implementations. However, this hiding of information means that each time a structure is dynamically accessed, the program must perform a lookup of the type’s information in a virtual table. The extra round-trips to the vtable quickly add up.

In Rust, dynamic dispatch is done using traits. Rust 2018 adds the impl and dyn keywords to make it easier to keep track of instances of dynamic dispatch, but it’s not always easy to avoid it entirely.

§Feature documentation

For full documentation of features like generic support, custom variant names, and more, please check the repository’s README.

§How it works

Observe the following example of code describing a user interface with knobs. Each knob can hold a value between 0.0 and 1.0. Some knobs provide a linear range, whereas other knobs provide a logarithmic range.

trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}

struct LinearKnob {
    position: f64,
}

struct LogarithmicKnob {
    position: f64,
}

impl KnobControl for LinearKnob {
    fn set_position(&mut self, value: f64) {
        self.position = value;
    }

    fn get_value(&self) -> f64 {
        self.position
    }
}

impl KnobControl for LogarithmicKnob {
    fn set_position(&mut self, value: f64) {
        self.position = value;
    }

    fn get_value(&self) -> f64 {
        (self.position + 1.).log2()
    }
}

fn main() {
    let v: Vec<Box<dyn KnobControl>> = vec![
        //set the knobs
    ];

    //use the knobs
}

There are other ways to keep an arbitrarily ordered list of different knob types, but none of them are quite as simple or easy to maintain. Unfortunately, this implementation relies on both heap-allocated Boxes and dynamic dispatch, which has performance implications.
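Filling in the placeholder comments, a minimal sketch of the dynamically dispatched version might look like the following (the knob contents and positions are illustrative, not part of the original example):

```rust
trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}

struct LinearKnob { position: f64 }
struct LogarithmicKnob { position: f64 }

impl KnobControl for LinearKnob {
    fn set_position(&mut self, value: f64) { self.position = value; }
    fn get_value(&self) -> f64 { self.position }
}

impl KnobControl for LogarithmicKnob {
    fn set_position(&mut self, value: f64) { self.position = value; }
    fn get_value(&self) -> f64 { (self.position + 1.).log2() }
}

fn main() {
    // Each element is a fat pointer to a heap allocation; every method
    // call below goes through a vtable lookup.
    let mut v: Vec<Box<dyn KnobControl>> = vec![
        Box::new(LinearKnob { position: 0.0 }),
        Box::new(LogarithmicKnob { position: 0.0 }),
    ];
    for knob in v.iter_mut() {
        knob.set_position(0.5);
    }
    let total: f64 = v.iter().map(|k| k.get_value()).sum();
    println!("{}", total); // 0.5 + log2(1.5) ≈ 1.085
}
```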

One alternative is to introduce a new enum type that can hold either a LinearKnob or a LogarithmicKnob as a variant, and that also implements KnobControl by matching on itself and delegating calls to its variants. This would look like the following:

enum Knob {
    Linear(LinearKnob),
    Logarithmic(LogarithmicKnob),
}

impl KnobControl for Knob {
    fn set_position(&mut self, value: f64) {
        match self {
            Knob::Linear(inner_knob) => inner_knob.set_position(value),
            Knob::Logarithmic(inner_knob) => inner_knob.set_position(value),
        }
    }

    fn get_value(&self) -> f64 {
        match self {
            Knob::Linear(inner_knob) => inner_knob.get_value(),
            Knob::Logarithmic(inner_knob) => inner_knob.get_value(),
        }
    }
}

Performance with this implementation is significantly improved, since all the information the program could possibly need to know about each knob can be deduced at compile time. Besides avoiding heap allocations and vtable lookups, this allows the compiler to squeeze out even more optimization through function inlining.
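As a self-contained sketch, the enum-based version can be exercised without any Boxes at all; the Vec stores the enum values inline (the trait and knob types from above are repeated here so the example compiles on its own):

```rust
trait KnobControl {
    fn set_position(&mut self, value: f64);
    fn get_value(&self) -> f64;
}

struct LinearKnob { position: f64 }
struct LogarithmicKnob { position: f64 }

impl KnobControl for LinearKnob {
    fn set_position(&mut self, value: f64) { self.position = value; }
    fn get_value(&self) -> f64 { self.position }
}

impl KnobControl for LogarithmicKnob {
    fn set_position(&mut self, value: f64) { self.position = value; }
    fn get_value(&self) -> f64 { (self.position + 1.).log2() }
}

enum Knob {
    Linear(LinearKnob),
    Logarithmic(LogarithmicKnob),
}

impl KnobControl for Knob {
    fn set_position(&mut self, value: f64) {
        match self {
            Knob::Linear(inner) => inner.set_position(value),
            Knob::Logarithmic(inner) => inner.set_position(value),
        }
    }
    fn get_value(&self) -> f64 {
        match self {
            Knob::Linear(inner) => inner.get_value(),
            Knob::Logarithmic(inner) => inner.get_value(),
        }
    }
}

fn main() {
    // No heap allocation per element: the enums live directly in the
    // Vec's buffer, and each match compiles to a simple branch.
    let mut knobs = vec![
        Knob::Linear(LinearKnob { position: 0.0 }),
        Knob::Logarithmic(LogarithmicKnob { position: 0.0 }),
    ];
    for k in knobs.iter_mut() {
        k.set_position(0.5);
    }
    let total: f64 = knobs.iter().map(|k| k.get_value()).sum();
    println!("{}", total);
}
```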

However, it’s easy to see that the cost of maintaining the source code for this extra structure is quite high. What happens when we add more knob types? What happens when we add more trait methods? Even worse, what happens when we do both?

The resulting code is very repetitive, which makes it a great target for automatic generation. The enum_dispatch macro can do that generation for you. The following code produces the same implementation using enum_dispatch:

#[enum_dispatch]
trait KnobControl {
    //...
}

#[enum_dispatch(KnobControl)]
enum Knob {
    LinearKnob,
    LogarithmicKnob,
}

That’s it. enum_dispatch will also automatically generate implementations of std::convert::From for each enum variant, so that new Knobs can be created without concern for the names of each enum variant.

let (a_linear_knob, a_logarithmic_knob) = some_existing_knobs();

let knob = Knob::from(a_linear_knob);
let knob = Knob::from(a_logarithmic_knob);
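(`some_existing_knobs` is left undefined in the snippet above.) The generated conversions behave like ordinary From impls, so `.into()` works as well. As a sketch of what the macro emits, the From implementations below are written by hand to mirror the generated ones:

```rust
struct LinearKnob { position: f64 }
struct LogarithmicKnob { position: f64 }

enum Knob {
    Linear(LinearKnob),
    Logarithmic(LogarithmicKnob),
}

// Hand-written equivalents of the conversions enum_dispatch generates:
impl From<LinearKnob> for Knob {
    fn from(k: LinearKnob) -> Self { Knob::Linear(k) }
}
impl From<LogarithmicKnob> for Knob {
    fn from(k: LogarithmicKnob) -> Self { Knob::Logarithmic(k) }
}

fn main() {
    // `From` implies `Into`, so either spelling works.
    let knob: Knob = LinearKnob { position: 0.25 }.into();
    assert!(matches!(knob, Knob::Linear(_)));
    let knob = Knob::from(LogarithmicKnob { position: 0.5 });
    assert!(matches!(knob, Knob::Logarithmic(_)));
}
```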

§Performance

The benches directory contains three benchmarks of different natures, each comparing four different methods of accessing a traited struct of an arbitrary type. The four methods are as follows:

| test name | explanation |
| --- | --- |
| `boxdyn` | The easiest way to access a struct: a heap allocation plus dynamic dispatch. |
| `refdyn` | Accesses the struct by reference, but still using dynamic dispatch. No heap allocation. |
| `customderive` | Uses a similar macro approach from the external enum_derive crate, which implements a method that returns the inner type as a dynamic trait object. |
| `enumdispatch` | Implemented using this crate. |

§The benchmarks

The following benchmark results were measured on a Ryzen 7 2700x CPU.

§compiler_optimized

The first set of benchmarks creates trait objects and measures the speed of accessing a method on them.

test benches::boxdyn_compiler_optimized       ... bench:   2,135,418 ns/iter (+/- 12,575)
test benches::customderive_compiler_optimized ... bench:   2,611,860 ns/iter (+/- 18,644)
test benches::enumdispatch_compiler_optimized ... bench:           0 ns/iter (+/- 0)
test benches::refdyn_compiler_optimized       ... bench:   2,132,591 ns/iter (+/- 22,114)

It’s easy to see that enum_dispatch is the clear winner here!

Ok, fine. This wasn’t a fair test. The compiler is able to “look through” the trait method call in the enum_dispatch case, notices that the result is unused, and removes it as an optimization. However, this still highlights an important property of enum_dispatched types: the compiler is able to infer much better optimizations when possible.

§blackbox

The next set of benchmarks uses the test::black_box method to hide the fact that the result of the method is unused.

test benches::boxdyn_blackbox       ... bench:   2,131,736 ns/iter (+/- 24,937)
test benches::customderive_blackbox ... bench:   2,611,721 ns/iter (+/- 23,502)
test benches::enumdispatch_blackbox ... bench:     471,740 ns/iter (+/- 1,439)
test benches::refdyn_blackbox       ... bench:   2,131,978 ns/iter (+/- 21,547)

The competitors faced virtually no impact, whereas enum_dispatch takes the full force of the black_box call. This test shows the power that avoiding dynamic dispatch gives to the compiler in the context of the previous test, but also demonstrates how much faster enum_dispatch is in real code: almost 5 times faster than the closest alternative.

§homogeneous_vec

The final set of benchmarks puts 1024 traited structs of arbitrary types at random into a Vec and measures the time it takes to successively iterate over the entire Vec, calling black_boxed methods on each element.

test benches::boxdyn_homogeneous_vec       ... bench:   5,900,191 ns/iter (+/- 95,169)
test benches::customderive_homogeneous_vec ... bench:   4,831,808 ns/iter (+/- 140,437)
test benches::enumdispatch_homogeneous_vec ... bench:     479,630 ns/iter (+/- 3,531)
test benches::refdyn_homogeneous_vec       ... bench:   5,658,461 ns/iter (+/- 137,128)

This might be one of the most likely use cases for traited structs of arbitrary types, and it’s where enum_dispatch really shines. Since a Vec of enum_dispatch objects is actually a Vec of enums rather than addresses, accessing an element takes half the indirection of the other techniques. Add that to the lack of vtable accesses, and we have a result that is 10 times faster than the closest alternative, and almost 12 times faster than the best technique from the standard library.
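The "half the indirection" point can be made concrete by comparing layouts. This sketch (using the knob types from this page) shows that a `Box<dyn Trait>` element is a two-word fat pointer whose payload lives in a separate heap allocation, while an enum element is the value itself, stored inline in the Vec's buffer:

```rust
use std::mem::size_of;

trait KnobControl {
    fn get_value(&self) -> f64;
}

struct LinearKnob { position: f64 }
struct LogarithmicKnob { position: f64 }

enum Knob {
    Linear(LinearKnob),
    Logarithmic(LogarithmicKnob),
}

fn main() {
    // A Vec<Box<dyn KnobControl>> element is a fat pointer (data pointer
    // plus vtable pointer); reaching the data requires chasing it.
    println!("{}", size_of::<Box<dyn KnobControl>>()); // two words

    // A Vec<Knob> element is the enum itself: the payload plus a small
    // discriminant, no pointer chase and no vtable.
    println!("{}", size_of::<Knob>());
}
```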

§Attribute Macros

  • enum_dispatch — Annotating a trait or enum definition with an #[enum_dispatch] attribute will register it with the enum_dispatch library, allowing it to be used to generate impl blocks elsewhere.